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INFERENCE USING SHAPE-RESTRICTED REGRESSION SPLINES 

By Mary C. Meyer 
Colorado State University 

Regression splines are smooth, flexible, and parsimonious non- 
parametric function estimators. They are known to be sensitive to 
knot number and placement, but if assumptions such as monotonicity 
or convexity may be imposed on the regression function, the shape- 
restricted regression splines are robust to knot choices. Monotone 
regression splines were introduced by Ramsay [Statist. Sci. 3 (1998) 
425-461], but were limited to quadratic and lower order. In this paper 
an algorithm for the cubic monotone case is proposed, and the method 
is extended to convex constraints and variants such as increasing- 
concave. The restricted versions have smaller squared error loss than 
the unrestricted splines, although they have the same convergence 
rates. The relatively small degrees of freedom of the model and the 
insensitivity of the fits to the knot choices allow for practical infer- 
ence methods; the computational efficiency allows for back-fitting of 
additive models. Tests of constant versus increasing and linear versus 
convex regression function, when implemented with shape-restricted 
regression splines, have higher power than the standard version using 
ordinary shape-restricted regression. 

1. Introduction. We consider the regression model 
(1) yi = f{xi) +aei, i = l,...,n, 

where the errors are i.i.d. Nonparametric regression methods provide esti- 
mates for / using minimal assumptions, and are appropriate when a para- 
metric form is unavailable. Many methods assume only some sort of smooth- 
ness; three of the most widely used of these are the kernel smoother, the 
smoothing spline, and regression splines. These require user-specified choices 
as bandwidth, or smoothing parameter, or number and placement of knots. 
If the fits are not robust to these choices, inference about the regression 
function is problematic. 
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Fits using only shape restrictions such as monotonicity or convexity do not 
require user-defined parameters, but are typically not smooth, nor are the fits 
parsimonious, in that the model degrees of freedom is in some sense large. 
Some inference methods have been developed: for testing constant versus 
increasing regression function [see Robertson, Wright and Dyskstra (1988), 
hereafter RWD] and for linear versus convex regression function [see Meyer 
(2003)], test statistics with exact distributions under the null hypothesis and 
normal errors assumption have been derived. 

Several methods combining smoothing and shape restrictions have been 
proposed. Mammen (1991) investigated the asymptotic behavior of the mono- 
tonized kernel estimator, and alternatively, the kernel-smoothed monotone 
regression estimator, showing that either obtains the n~^/^ pointwise con- 
vergence rate of the original kernel estimator. 

The constrained smoothing splines are more challenging to obtain. The 
expression 

n f, 

(2) Y.iy^-m)]' + x [f^'\x)fdx 

i=i -^^ 

is to be minimized over the set of / satisfying the constraints. The constant A 
is called the "smoothing parameter" because larger values result in smoother 
fits. Tantiyaswasdikul and Woodroofe (1994) characterized the monotone 
smoothing splines for k = 1. The natural cubic smoothing spline minimizes 
the expression (2) for k = 2, but although it is relatively easy to impose 
shape restrictions at the observed x-values, ensuring that the restrictions 
hold between the observations is more difficult. For monotone cubic interpo- 
lation, it is known that additional knots must be placed between the obser- 
vations [Fredenhagen, Oberle and Opfer (1999)], but the exact placement of 
these knots is not understood. Delecroix, Simioni and Thomas- Agnan (1995) 
demonstrated through simulations that imposing shape restrictions on top 
of smoothing leads to substantial reductions of squared error loss for moder- 
ately sized samples and many choices of underlying function, error variance, 
and shape. The monotone smoothing splines were shown by Mammen and 
Thomas- Agnan (1999) to obtain the optimal n~P^^'^^~^^^ convergence rate, 
where p is the maximum of k and the order of the polynomial. 

In this paper the estimation of / using shape-restricted regression splines 
is discussed, and inference methods proposed. Regression splines are sim- 
ple and straight-forward; a set of basis functions are provided which act as 
the regressors in an ordinary least-squares model. The basis functions are 
smooth, often piecewise polynomials of degree d between the user-specified 
knots, with d — 1 continuous derivatives. The optimal convergence rate is 
attained; Huang and Stone (2002) provide a nice discussion of the trade-off 
between estimation error and approximation error in choosing the number 
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of knots. The set of linear combinations of the basis functions typically pro- 
vides more than enough flexibility to fit a scatterplot; in fact, in the absense 
of shape restrictions, the flexibility for a rather small number of knots is so 
great that most of the regression spline literature concerns guarding against 
the over-fitting of data. Friedman and Silverman (1989) discussed knot place- 
ment of unconstrained polynomial splines, based on forward-backward model 
selection. They imposed a minimum span for possible knot locations, based 
on the idea that the smoother will "follow runs" of sequential positive or 
negative errors, and result in over-fitting. Eilers and Marx (1996) proposed 
using a rather large number of knots, but including a penalty term to reduce 
the flexibility of the regression estimator and guard against over-fitting. 

These techniques are not necessary for the shape-restricted regression 
spline. Typically, when assumptions about both shape and smoothness are 
warranted, the fits are robust to the choice of smoothing parameters. The 
monotonicity constraints, for example, do not allow peaks and valleys in the 
fit, although rounded "steps" may be present, and convexity (or concavity) 
constraints disallow any sort of wiggling. In other words, the shape restric- 
tions themselves provide some smoothing, and obviate cures for over-fitting. 

An example data set involving age and income data is shown in Figure 1. 
These data were used as an example in Ruppert, Wand and Carroll (2003) 
(hereafter RWC) and represent a sample of 205 Canadian workers, all of 
whom were educated to grade 13. The relationship between log(income) 
and age is to be modeled nonparametrically. In plot (a) there are three 
examples of penalized regression splines. The solid line represents the cross- 
validation choice of the smoothing parameter; the dashed line has a larger 
parameter, and the dot-dash line has a smaller. Smaller penalty parameters 
allow the fit to be steeper at the left-hand side, but allow for wiggles toward 
the right-hand side. The cross-validation choice still has a dip at about age 
41-42. If we believe that the true relationship between log(income) and 
age should not have a dip, we might assume the relationship is convex. In 
plot (b) we see the cubic regression splines, constrained to be convex, for 
three choices of interior knots. The fits are very close to one another, almost 
indistinguishable, and have no trouble with the steep slope at the left. 

A second motivating example uses the "onion" dataset from RWC. When 
onions are planted more densely in a field, it is expected that the yield 
per plant will decrease, and we suppose here that the log of the yield is also 
convex in the density of the planting. The smoothing splines presented in plot 
(a) of Figure 2 are constrained to be decreasing and convex (at the observed 
rr-values), for three choices of the smoothing parameter. The solid curve 
represents the cross-validation choice. The parameter for the dotted curve 
is larger, and the fit is close to a straight line. The smoothing parameter 
is smaller for the dashed curve, which rises more sharply at the left. The 
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decreasing convex cubic regression splines are shown in (b), for two, four, 
and six interior knots. Again, these fits he almost on top of each other. 

Robustness of nonparametric function estimator to user inputs is impor- 
tant for inference, to ensure that different choices will not produce different 
answers. The rest of the paper is organized as follows. The algorithm for the 
computation of shape-restricted regression splines is described in Section 2. 
Hypothesis tests for constant versus increasing and linear versus convex re- 
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Fig. 1. Age and income data for a sample of Canadians, (a) Penalized cubic splines 
for three values of the penalty parameter; the solid curve represents the cross-validation 
choice, (h) Cubic regression splines constrained to be concave, with three (dashed), five 
(solid) and seven (dot-dash) interior knots. 
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Fig. 2. Comparison of fits to the onion data, (a.) The natural cubic smoothing spline 
fits constrained to be decreasing and convex, for three choices of smoothing parameter; the 
solid line is the cross-validation choice, (h) The decreasing convex cubic regression spline, 
for three choices of knots. 
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gression function are discussed in Section 3, where an exact test statistic is 
obtained. Degrees of freedom and estimation of model variance are discussed 
in Section 4. 



2. Computing the estimator. Ramsay (1988) introduced monotone re- 
gression spHnes to fit a scatterplot of data {xi,yi), for i = 1, . . . ,n, where 
the ordered. For regression spHnes of order k, choose I grid points 

tk+i, tk+i, and define knots xi = ti = ■ ■ ■ = tk < ■ ■ ■ < U+k+i = ■■■ = ti+2k = 
Xn - Then the number of M-spHne basis functions is m = l + k; these are given 
recursively as follows. Order 1 M-splines are the piecewise constant (step 
functions) 

j^W (x) ^ J - — ' for ti < X < ti+i , 




otherwise, 

for i = 1, . . . ,1 + 1. Order k M-splines are computed from the lower orders: 

k[{x - U)Mt'\x) + (t,+, - x)Mt^-'\x)] 

Mf^(x) = <| ik-l)iti+k-ti) 
for ti<x< ti+k, 
0, otherwise. 

Finally, the I-splines are 

(3) lf'\x)= r M'>''\u)du for i = l,...,/ + fc = m, for x€ [xi,xj. 

The regression function is estimated by a linear combination of the ba- 
sis functions and the constant function. To constrain the estimator to be 
monotone, the coefficients of the basis functions must be nonnegative (the 
coefficient of the constant function is not constrained). 

For convex C-splines, the /-splines are integrated, to get basis functions 
that are both increasing and convex. In particular, 

(4) c\^\x) = / l\^\u)du for i = 1, ...,/ + A: = m for X € [xi, x„]. 

A convex regression function is estimated using linear combinations of the 
basis functions with nonnegative coefficients, plus an unrestricted linear 
combination of the constant function and the identity function ^(x) = x. 
If the underlying regression function is both increasing and convex, we re- 
strict the coefficient on the identity function also to be nonnegative. 

Define the set of vectors in R" containing the values of the jth ba- 
sis function, evaluated at the x-values. For fcth-order monotone regression 
splines, let o"^ = /j'^^(xj), for j = 1, . . . ,m, and i = 1, . . . ,n. For the convex 
case, simply substitute the C-spline basis functions for the Ij. Let V be the 
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Fig. 3. Spline basis functions (curves) using three equally spaced interior knots, along 
with edge vectors (with values marked as dots) for a data set with n = 40 equally spaced 
design points. The knot positions are marked with "X. " (&) Quadratic monotone, scaled so 
that the vectors have mean zero, (h ) Cubic convex, scaled so that the vectors are orthogonal 
to both the one-vector and the x-vector. 



linear space contained in the constraint set; for the monotone and monotone- 
convex cases, V = C{1), and for convex constraints, V = C{l,x), where C 
denotes "hnear space spanned by," 1 = (1, . . . , 1)', and x = (xi, . . . , Xn)' ■ 
A set of "generating vectors" that are orthogonal to V are 

= (T^ -U{a^\V), 

where 11 is a projection operator. Because the cr and v vectors form a lin- 
early independent set, then the d vectors are also linearly independent. The 
constraint set C can then be characterized by 

C = 1^:0 = v + Yy^j^^i where bj >0,j = l,...,m, and ^; G l/j. 

For y = 6 + e, the least-squares estimator 6 minimizes ||y — 0|p over 6 gC, 
where the notation is \\a f = Eof. The set of nonnegative linear combina- 
tions of the S vectors is called the "constraint cone" fi, and 9 can be found 
by projecting onto and V separately, then adding the projections. A de- 
piction of the cone edges for piecewise quadratic increasing constraints using 
three interior knots and n = 40 is shown in Figure 3, plot (a). The piecewise 
cubic edge vectors for the convex case are shown in plot (b). The knots are 
marked on the plots with "X." The basis functions and basis vectors are 
easily modified for use with concave constraints, or increasing concave, and 
so on. 
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The third-order (quadratic) J-spHnes defined in (3) are "proper" set ba- 
sis functions, in that the coefficients of the piecewise quadratic monotone 
basis functions being nonnegative is a necessary and sufficient condition for 
the hnear combination to be nondecreasing. This can be seen by observing 
that, at each knot, exactly one basis function has a positive first deriva- 
tive. Similarly, the first and second order /-splines form proper sets, but 
the cubic /-splines do not. A linear combination of cubic /-splines might be 
nondecreasing while one or more of the coefficients is negative, so that the 
least-squares estimator might lie outside of C. The lack of a cubic version of 
the monotone splines is a drawback when the user would like a smooth first 
derivative. An approach to obtaining a solution is presented at the end of 
this section. 

For the convex cubic splines defined in (3), we observe that, at each knot, 
exactly one basis function has a positive second derivative, so that the C- 
splines form proper sets of basis functions up to order four. 

Ramsay (1988) provides a gradient-based algorithm for finding the least- 
squares solution for the /-splines. This converges to the solution in "infinitely 
many" steps, meaning that the algorithm produces a sequence of estimates 
that get closer to the true least-squares estimate, then stops when the gradi- 
ent satisfies a convergence criterion. The algorithm proposed here finds the 
true solution in a small number of steps, by taking advantage of the fact 
that the constraint set is a closed convex polyhedral cone in M". 

Because C is convex, the minimizer 6 of ||y — 0|p over 6 £ C is unique, 
and the necessary and sufficient condition is {y — 0,6 — 6) > 0, for all G C 
(RWD). Here the notation {a,b) denotes the inner product ^aihi. Because 
C is a cone, the condition may be written as 

(5) {y-e,e) = o 

and 

(6) {y-9,9)<0 for all OeC. 

Subsets of the generators define "faces" of the constraint cone. Any 
J C {1, 2, ... , m} defines a set T{J) = J2jej bjS^ , where bj > for j £ J. The 
constraint cone itself is a face with J = {1, . . . ,m}, and the origin is a face 
with J equal to the empty set, so that there are 2™ faces. The projection 
onto the cone Q will land on one of these faces, and in fact is a projection 
onto a linear subspace; for a proof of the following proposition, see Meyer 
(1999). 

Proposition 1. Let J be the subset o/{l,...,m} such that the unique 
minimizer of \\y — 0|p is 6 = v + bjS-' , where the bj are strictly positive 

for j G J. Then 6 is the projection of y onto the linear space spanned by 
vectors in V and the 5-' , j G J. 
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The mixed primal-dual bases algorithm by Fraser and Massam (1989) and 
the hinge algorithm by Meyer (1996) are efficient ways to determine the set 
J. In either scheme, an initial set Jq is proposed, and ordinary least squares 
projection onto the face with edges indexed by J is determined. Edges are 
added or removed one-by-one (the rules are different for the two algorithms) 
until the correct J [as determined by (1) and (2)] is found. A QR decom- 
position of the "design matrix" allows the new projection to be obtained 
from the old projection with less computation. To demonstrate the speed of 
the algorithm, 10,000 datasets of size n = 100, simulated using yi = + ei, 
with X values equally spaced on (0,2) and i.i.d. standard normal errors. A 
piecewise quadratic monotone spline was fit with the hinge algorithm to each 
dataset, using four interior knots. At most ten iterations of the algorithm 
were needed; the modal number of iterations was five. When n is increased 
to 500 and using six interior knots, the maximum number of iterations (out 
of 10,000 simulated data sets) was twelve, with a mode of seven. 

Consistency and rates of convergence. The following proposition states 
that the shape-restricted version of the regression spline has smaller squared 
error loss than the unrestricted version, when the true regression function 
satisfies the shape assumptions. Therefore, the shape-restricted regression 
spline is consistent under the same regularity conditions as the ordinary 
piecewise polynomial regression spline, and the rate is at least as good. The 
proof is straight- forward. 

Proposition 2. Let S he the linear space spanned by the S vectors and 
the vectors in V. Let 9i = f{xi), and assume the shape restrictions hold for 
f. Let y be the unconstrained projection of y onto S, and recall that is 
the projection of y onto C. Then 

\\o-o\\ < \\y-e\\, 

with equality only if = y. 

Hwang and Stone (2002) show that for the unrestricted case and "bounded 
mesh ratio," the asymptotically optimal number of knots is I ^ n^^^'^'P~^^\ 
where p is the order of the polynomial pieces. This choice allows the estima- 
tor to attain the pointwise convergence rate Op(n~P/^^^''~^)). 

Choice of knots. Shape restrictions impose some degree of smoothness 
by themselves. Combining shape restrictions with smoothing will result in 
estimators that are typically more robust than smoothing only, especially if 
a convexity assumption is warranted. The asymptotically optimal n"^/^^^"'"^) 
rounds to two or three for n up to about 500, but if the underlying regression 
function shows a lot of variation such as rapid rises, more knots might be 
needed to follow the data adequately. This is demonstrated in the following 
simulations, which give some insight into knot choices. 
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Table 1 

Comparison of the root mean squared error loss for penalized splines with quadratic 
regression splines constrained to be increasing. Data were simulated using the indicated 
f, cP' = 1, and x-values equally spaced on (0,1). PSPL indicates the penalized regression 
spline with k = n/3 knots, and cross-validation selection of the smoothing parameter; MR 
is ordinary monotone regression, MSPL2 is quadratic monotone regression spline with 
two interior knots, MSPL4 is defined similarly 



f{x) = Ax f{x) = 5exp(10a; - 5)/[l + exp(10a; - 5)] 



n 


PSPL 


MR 


MSPL2 


MSPL4 


PSPL 


MR 


MSPL2 


MSPL4 


40 


0.36 


0.43 


0.31 


0.34 


0.41 


0.52 


0.47 


0.35 


80 


0.26 


0.34 


0.23 


0.26 


0.33 


0.41 


0.41 


0.25 


200 


0.17 


0.25 


0.15 


0.17 


0.22 


0.27 


0.26 


0.16 



Goodness of fit. We compare the root mean squared error loss of the 
constrained regression sphnes with that of the penahzed regression spUne 
and the standard shape-restricted regression estimators using simulations. 
The squared error loss for the estimate fj from the jth. simulated dataset is 

1 

SEL, = -Y,[f^{x,)-f{x,)f, 

1=1 

and the square roots of the average SEL are reported for 10,000 simulated 
datasets. 

For monotone constraints, we use two underlying functions: /(x) = 4x 
and f{x) = 5exp(10x — 5)/(l + exp(103; — 5)) over (0, 1), both with unit 
model variance. The first varies steadily, but the latter has a steep increase 
in the middle of the range of the data, which makes it difficult for the 
quadratic monotone spline if there are too few knots. The fit with two in- 
terior knots is not flexible enough to fit the steep rise adequately, although 
using four interior knots gives a substantial improvement. In summary, the 
4-knot monotone spline does best for the sigmoidal data for all three sam- 
ple sizes, followed by the penalized spline. The 2-knot spline does not do 
better than the unsmoothed monotone regression for this choice of function. 
For the linear function, the 2-knot spline performs best, followed by the 
4-knot spline and the penalized spline, which out-perform the unsmoothed 
monotone regression. 

The advantage of using monotone convex constraints when available is 
illustrated using the regression functions f{x) = Ax and /(x) = 4x^. In each 
case, the smoothed constrained fits do best, followed by the unsmoothed, 
constrained regression. The unconstrained fit is last in each case. 

Weighted regression. Suppose we have the model y = 6 + ae, where 
cov(e) = A for a known positive definite A. The weighted shape-restricted 
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regression spline fit is easy to obtain with a transformation of the data and 
of the constraint cone. We transform the model to the i.i.d. case by multi- 
plying the model through by A~^/'^, the inverse of the Cholesky decomposi- 
tion of the covariance matrix, to get y = 6 + ae, and similarly transforming 
= A~^/'^v, j = 1, . . . ,r and 6'' = A~^^'^5^ . The projection is performed 
using the transformed model, then the answer is subjected to the reverse 
transformation. 

This is useful when there are multiple observations at some x-values. 
For example, the ages in the Canadian age and income dataset are integers 
ranging from 21 to 65, with one to twelve observations at each age. Because 
we want to have distinct x-values, we average the incomes for each age 
group and weight the regression according to the number of observations 
comprising the average. The fits shown in Figure 1(b) were computed using 
the weighted model. In Figure 4 we see the weighted residuals plotted against 
age for the fit with five interior knots. There is not so much of a "fanning 
out" pattern as might be expected from the scatterplot, partly because more 
observations were taken for the younger ages. In fact, if the outlier at age 
57 is ignored, there is hardly any pattern to the residuals. 

Additive models. The computational efficiency facilitates "back-fitting" 
additive models of the form 

y = e + Df3 + e, 

for an n X g design matrix D and r-dimensional parameter vector /3. For 
a given P, £ C may be found to minimize — Df3) — using the 
projection algorithm. The space is searched for the optimal (3. 

A simple example is the two-parallel curves model. To illustrate, we use 
the onion data set again, with two locations of farms indicated as shown in 
Figure 5. The decreasing convex parallel cubic regression splines are shown 
for three choices of knots. 

Table 2 

Same as for Table 1 except MCR is ordinary monotone- convex regression, MCSPL2 is 
cubic monotone- convex regression spline with two interior knots, MCSPL4 is defined 

similarly 



n 




/( 


x) = 4x 








) = 4a;2 




PSPL 


MCR 


MCSPL2 


MCSPL4 


PSPL 


MCR 


MCSPL2 


MCSPL4 


40 


0.36 


0.30 


0.21 


0.29 


0.36 


0.31 


0.27 


0.27 


80 


0.26 


0.22 


0.19 


0.20 


0.25 


0.23 


0.20 


0.21 


200 


0.16 


0.14 


0.12 


0.13 


0.17 


0.16 


0.14 


0.14 
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Fig. 4. Weighted residuals for the fit to the Canadian age and income data. 
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Fig. 5. Log (yield) of onion plants against density of planting, for two locations. The 
Pumong Landing data are shown as triangles and the Virginia data arc indicated with 
circles. The cubic regression splines constrained to he decreasing and convex are shown for 
two, four, and six equally spaced interior knots. 



All distinct x-values are used to define the spline basis functions. Given a 
choice of /3, the least-squares parallel curves can be computed though a cone 
projection. The golden section search method [see Press et al. (1992), Sec- 
tion 10.1] is used to find the parameter (3 representing the location effect, or 
the distance between the curves, that minimizes the overall sum of squared 
residuals. The estimates for /3 do not vary substantially with choice of knots. 
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For two, three, and four interior knots, we estimate that the Pumong Land- 
ing log (yield) is 0.335 points higher than for the Virginia onions, and for 
five and six interior knots, the estimated difference is 0.338. 

Toward a monotone piecewise cubic regression spline. The necessary and 
sufficient conditions that a cubic function be monotone on an interval can 
not be written as a set of linear inequality constraints, so a proper set of 
basis functions for the monotone cubic regression splines does not exist. 
Fritsch and Carlson (1980) outline the conditions and provide an algorithm 
for the monotone piecewise cubic interpolation to monotone data, which will 
be part of the method proposed here. 

The set of nondecreasing vectors in form a polyhedral convex cone that 
can be characterized by Qm = '■ AO > 0}, where the nonzero elements of 
the (n — 1) X n matrix A are Ai^i = — 1 and Ai^i^i = 1, z = 1, . . . , n — 1. The 
cubic M-spline vectors span a linear subspace of , and the intersection of 
this subspace with forms a polyhedral convex subcone The edges of 
this cone can be used as basis vectors for the monotone cubic splines. This 
constrains the fit to be increasing at the observations, that is, < • • • < 
but between observations the piecewise cubic function may be decreasing. 
Hence, we propose to use these basis vectors to get the least-squares solu- 
tion 6, then, if necessary, find the monotone interpolating cubic spline of 
Fritsch and Carlson (1980) to get /. This interpolation does not minimize 
the integral of the squared second derivative of /, and, in fact, there is not 
a unique monotone cubic interpolation, but the optimal convergence rate is 
attained for any monotone interpolation. 

The general method for finding the edges for the intersection of a convex 
cone with a linear subspace is as follows. Suppose a basis for the linear 
subspace is represented by the columns of the n x k matrix V, and the 
cone is defined by the m x n constraint matrix A, that is, AO > 0. Then 
the vector v = Vc is in the intersection if AVc > 0, so that the constraint 
matrix for the coefficient vector is the m x k matrix AV . Suppose the row 
space of AV has dimension q <k. The edges of the cone are found as follows. 
If ri, . . . ,rg„i are linearly independent rows of AV, vq G M'^ is orthogonal 
to the space spanned by ri,...,rg„i, and AVtq > 0, then tq is an edge. 
Furthermore, all edges are of this form. The proof of this claim can be found 
in Meyer (1999). 

Using this method, we can find all linear combinations of the M-splines 
that are increasing at the observations. The number of edge vectors for the 
sub-cone can be quite large, which is typical for the case of more constraints 
than dimensions [see Meyer (1999) for more examples]. For three interior 
knots and n = 40, sixteen of the 233 edges are shown in Figure 6. The dots 
are the values of the edge vectors, and the lines are the piecewise cubic 
spline functions. The functions appear to be increasing, but some are not. 



SHAPE-RESTRICTED REGRESSION SPLINES 



13 



For example, the top left curve is increasing and then decreasing between the 
last two observation points. The cone formed by these edges is larger than 
that of the cubic /-splines, and any vector that is a linear combination of 
the cubic M-spline basis vectors and satisfies 9i < ^j+i, z = 1, . . . , n — 1, is in 
the cone. Therefore, a monotone cubic spline interpolation of the projection 
of y onto the cone coincides with the least-squares solution and attains the 
optimal convergence rate. The piecewise cubic monotone spline is compared 
with the piecewise quadratic in Figure 7, for the same sigmoidal function 
used in the goodness of fit simulations. The cubic version is more flexible 
with fewer knots, and is able to follow the steeper rise in the data. 

3. Hypothesis testing. One practical use for nonparametric function es- 
timation is in testing proposed parametric models. The simplest examples 
are the test Hq : f = c versus Ha : / is increasing, and the test of linear ver- 
sus convex regression function. When the alternative fit is the ordinary (un- 
smoothed) shape-restricted estimator, a likelihood ratio approach yields a 
test statistic distributed as a mixture of beta random variables. See RWD, 
Chapter 2, for the monotone case and Meyer (2003) for the convex case. If 
we include the "smooth" assumption, and use the shape-restricted regres- 
sion spline for the alternative fit, we get again a test statistic with a mixture 
of betas distribution. To show this, some characterizations of the constraint 
and polar cones are needed. 

Let S be the linear space spanned by the vectors d\ j = 1, . . . ,111. Then 
the polar cone is defined as all vectors in S making obtuse angles with all 
vectors in Q: 

Q° = {peS: (p, 6) < 0, for all 6 G Q}. 

Let f^, . ■ . ,7"^ be the rows of the matrix [— (A'A)^^A'], where the columns 
of the nx m matrix A are the (5-' . Then (7-^ , 5*) = for i 7^ j, and {'j^ , ^"') = 
— 1, and by construction, -y^ G Q,^ and the 7-' span the same space as the 
vectors. The proof of the next result is straight-forward. 

Proposition 3. The vectors 7-' are generators of Q,^ , that is, 
= < p : p = ^ bj-f^ , where bj > 0, for all j = 1, . . . , m > . 

Let r be the dimension of V, so that r = 1 for monotone or monotone- 
convex, and r = 2 for convex constraints. We may define n — m — r linearly 
independent vectors w that are orthogonal to the S vectors and to V. The 
proof of the next proposition is given in the Appendix. 
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Fig. 6. Some of the edge vectors for the piecewise cubic regression sphne, monotone at 
the X -observations, for three interior knots and n = 40. 

o 




I "T 1 \ ! 1 r 

0.Q 0.2 0.4 0.6 D.S 1.0 



Fig. 7. Comparing the piecewise quadratic monotone spline (dashed) with the piecewise 
cubic (solid), both using two interior knots and n — 40. The true function is the dotted 
curve. 
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Proposition 4. For any y G M", there is a unique J C {1, . . . ,?7i} so 
that y can be written uniquely as 

n—m—r 

(7) y = v + Y,bjS^ + ^.^3^^^ + ^i"^^' 

jf^J iiJ i=i 

where v £V, bj > for j & J and bj > for j ^ J. 

Let Xoi = {SSEq — SSEij/a"^, where SSEq is the residual sum of squares 
under the hypothesis and SSEi is that under Ha'-O G C. To derive 
the distribution of Xoi under Hq, we need to define "sectors" Cj, for all 
J C {1, . . . ,m}. We have 

{n—m—r 
y:y = v + ^bjS^ +^bjy + ^ CjW 
jeJ iiJ j=i 

where v , bj > for j J and bj > for j ^ J. By Proposition 4, these 
2^" sectors are disjoint and cover M". Note that the interior of the constraint 
set is itself a sector with J = {1, . . . , m}. We write 

P(Xoi<«)= E ^(Xoi<«|yeCj)P(yeCj). 

subsets J 

The derivation of Theorem 1 of Meyer (2003) applies, to show that under 
Hq, P{xoi ^ o.\y ^ C'j) = P{x'^{n — d — r) <a), where x^{k) is a chi-square 
random variable with k degrees of freedom, and d is the number of indices 
in J. Therefore, 

m 

Pixli < a) = E P(xHn -d-r)< a)P{D = d), 

(1=0 

where D is a random variable indicating the number of S generators corre- 
sponding to the sector in which the data vector y falls, that is, the size of 
the set J. 

The mixing distribution parameters P{D = d) are found numerically as 
described in Meyer (2003). If the model variance cr^ is not known, the test 
statistic 

D Xoi SSEq - SSEi 

°' ~ Xli + SSEi/a^ ~ SSEo 
has under Hq a mixture of beta densities 

m 

P{Boi<a) = Y,P 

(1=0 

where B{p,q) is a beta random variable with parameters p and q. Because 
the mixing distribution can be determined as precisely as desired, the dis- 
tribution of the test statistic under the null hypothesis is known exactly. 



-I I- 



< a 



P{D = d), 
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Onion example. The example of Figure 2 involves estimating yield as a 
function of planting density for onions. Suppose the farmer would prefer to 
use the simple linear relationship. Is there enough evidence to show that this 
is incorrect? The test for linear versus convex regression function provides a 
value of 0.0047 for two interior knots, p = 0.0036 for three, and p = 0.0037 
for four. 

Simulations. The power for the tests is compared with tests using the 
standard shape-restricted regression estimators, and also with the F-test. To 
compare tests of monotone versus increasing regression function, we choose 
two underlying regression functions (linear and "ramp"), three sample sizes, 
and for each sample size, three model standard deviations. The "ramp" 
regression function is f{x) = exp[8(a; — 1/2)], so that it is flat at the left 
and increasing steeply at the right. For each sample size, the three model 
standard deviations were chosen so that the power of the F-test was 0.25, 
0.50 and 0.75. The alternative fit for the F-test is a line, so that the model 
is not correct for the data generated from the ramp function. The idea is 
that practitioners might not know the true regression function, and might 
use the simplest choice of parametric function for the alternative fit in a 
"test for trend." 

For each combination of sample size, model standard deviation, and un- 
derlying regression function, 10,000 datasets were simulated. For each dataset, 
three hypothesis tests were performed: the -F-test for constant versus linear 
regression function, for constant versus increasing regression function using 
the monotone quadratic (IQRS), and the test for constant versus increas- 
ing regression function using the standard monotone regression estimator 
(MREG). 

For the linear regression function, the F-test is the gold standard. We 
see that the tests using nonparametric alternative fits have smaller power, 
but the regression spline versions have power that is considerably closer 
to that of the F-test. For the ramp regression function, the power for the 
nonparametric tests is larger, and again the regression spline version has 
higher power than the standard version. 

The next set of simulations compares the power for the linear versus con- 
vex regression function tests. The F-test uses the quadratic as the alternative 
fit, the "OCRS B-test" uses the convex piecewise cubic regression spline as 
the alternative fit, and "GREG B-test" uses the standard convex regression 
estimator. The two underlying regression functions are the quadratic and 
the "ramp" functions. Again, results are reported for combinations of un- 
derlying regression function, three sample sizes and three model standard 
deviations. For the quadratic underlying regression function, the F-test has 
highest power, and the power for the test using regression splines has higher 
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Table 3 

Power comparisons for the test of constant vs. monotone regression function. For the 
regression spline, the number of interior knots is 1 = 2, 2, and 3 corresponding to n = 20, 
40 and 80. The results for the tests using the ordinary shape-restricted regression 
estimators are labeled as MREG 



Linear regression function "Ramp" regression function 

n F-test IQRS B-test MREG B-test n F-test IQRS B-test MREG B-test 



20 


0.25 


0.22 


0.22 


20 


0.50 


0.45 


0.44 


20 


0.75 


0.69 


0.68 


40 


0.25 


0.22 


0.21 


40 


0.50 


0.45 


0.41 


40 


0.75 


0.70 


0.66 


80 


0.25 


0.21 


0.20 


80 


0.50 


0.44 


0.41 


80 


0.75 


0.69 


0.65 



20 


0.25 


0.28 


0.27 


20 


0.50 


0.63 


0.61 


20 


0.75 


0.90 


0.88 


40 


0.25 


0.30 


0.28 


40 


0.50 


0.62 


0.59 


40 


0.75 


0.89 


0.86 


80 


0.25 


0.28 


0.27 


80 


0.50 


0.60 


0.56 


80 


0.75 


0.88 


0.84 



Table 4 

Power comparisons for the test of linear vs convex regression function. For the spline, 
the number of interior knots is 1 = 2, 2 and 3 corresponding to n = 20, 40 and 80. The 
results for the tests using the ordinary shape-restricted regression estimators are labeled 

as GREG 



Quadratic regression function "Ramp" regression function 

n F-test CQRS B-test CREG B-test n F-test CQRS B-test CREG B-test 



20 


0.25 


0.22 


0.212 


20 


0.50 


0.46 


0.42 


20 


0.75 


0.70 


0.66 


40 


0.25 


0.22 


0.20 


40 


0.50 


0.45 


0.40 


40 


0.75 


0.70 


0.64 


80 


0.25 


0.22 


0.18 


80 


0.50 


0.44 


0.38 


80 


0.75 


0.69 


0.62 



20 


0.25 


0.25 


0.24 


20 


0.50 


0.54 


0.52 


20 


0.75 


0.80 


0.78 


40 


0.25 


0.26 


0.24 


40 


0.50 


0.53 


0.49 


40 


0.75 


0.81 


0.76 


80 


0.25 


0.25 


0.21 


80 


0.50 


0.54 


0.49 


80 


0.75 


0.80 


0.75 



power than for the ordinary convex regression estimator. For the ramp re- 
gression function, the test using the regression sphnes again has the highest 
power. 

4. Degrees of freedom and model variance. Hastie and Tibshirani (1990) 
suggest an "effective error degrees of freedom" for a linear smoother: if the 
estimator is y = Sy, then df^"^"^ = n — tr(2S' — SS'^). This can be used to 
estimate the model variance. This definition is consistent with ordinary least- 
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squares regression, where S is a projection matrix and, hence, 2S — SS^ = S 
and tr(S) is the dimension of the hnear space defined by the model. 

For ordinary shape-restricted regression and for shape-restricted regres- 
sion sphnes, the estimator is a mixture of hnear estimators, with the mixing 
distribution corresponding to the probabihties of the data vector falhng in 
the sectors. Let D he r plus the number of edges of the face of the cone on 
which the projection falls, and let d be the realized value, so that a candidate 
for error degrees of freedom is n — d. Meyer and Woodroofe (2000) showed 
that the model variance estimate SSE/{n — d) is too small; in fact, 



(8) 



n 



2EiD)<^<n 



E{D), 



where D is the random variable indicating the dimension of J plus r. They 
recommend SSE/{n — l.bd) to estimate the model variance for ordinary 
monotone regression. Further, for monotone regression, they showed that 
E{D) = Op{n^l^). 

For shape-restricted regression splines, E[D) is limited by the number of 
knots. If the number of knots grows slowly, as n^l^'^P^^^ for example, then 
the model dimension is small compared to the ordinary shape-restricted 
regression. We may write the SSE as 



(9) 



\y 





2 


n—m—r 




+ 











where the last term may be written as + • • • + a^-m-r if the are chosen 
to form an orthonormal set. The are mean-zero, independent normal ran- 
dom variables with common variance cj^ , and because the are orthogonal 



Table 5 

Comparison of estimators of model standard deviation. The percent bias and standard 
deviation of four estimators are shown: MQRS— Monotone Quadratic Regression Spline, 
MQRS (cons) = conservative version, M-W = Meyer-Woodroofe estimator, 
MLE— Maximum Likelihood Estimator. Results from 10,000 simulated data sets with 
f{x) = ; two significant figures presented 



MQRS MQRS (cons) M-W MLE 

n cr % bias std dev % bias std dev % bias std dev % bias std dev 



20 


0.1 


-0.66 


0.0036 


3.8 


0.0038 


26 


0.0086 


-65 


0.0019 


40 


0.1 


-0.43 


0.0024 


1.7 


0.0025 


8.1 


0.0035 


-49 


0.0016 


80 


0.1 


-0.22 


0.0016 


0.55 


0.0017 


3.3 


0.0020 


-35 


0.0013 


20 


1.0 


-1.9 


0.34 


12 


0.39 


-5.6 


0.36 


-28 


0.27 


40 


1.0 


-0.77 


0.24 


6.8 


0.25 


-1.5 


0.25 


-18 


0.21 


80 


1.0 


-0.55 


0.16 


2.6 


0.17 


-0.52 


0.17 


-12 


0.15 
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to S, their distributions are independent of the constraint y € Cj. Therefore, 

2 



(10) 



n—m—r 



0"^ ~ X^in — m — r). 



We can show that a result similar to (8) holds for the shape-restricted re- 
gression splines, so that 

II y -d rft-VlP 

m - 2E(D) < "^^-^^^ " <m- E(D), 

where D is the size of J. If the shape restrictions are strictly held (i.e., the 
function is strictly monotone), the constraints are more likely to be unbind- 
ing as n gets larger. The value of D for shape-restricted regression splines 
tends to be closer to m, so that m — 2E{D) may be negative. Therefore, we 
choose as the model variance estimator 

(11) a' - 



n 



where d is the size of the realized J plus r. This estimator tends to have 
a small negative bias that decreases as n gets larger. An alternative con- 
servative estimator is cj^ = SSE /{n — m); this over-estimates the variance 
because the denominator is too small. 

Simulations show that, for the monotone assumptions, the proposed es- 
timator has smaller bias than the MLE and smaller bias and variance than 
that using the Meyer- Woodroofe estimator and ordinary monotone regres- 
sion, for small to moderate sample sizes and several choices of model vari- 
ance. In Table 5 four estimators of model standard deviation for monotone 
regression are compared. For each combination of choices of n and a, 10,000 
datasets were simulated using a linear regression function and i.i.d. normal 
errors. The MQRS uses increasing quadratic regression spline with variance 
estimate SSE/{n — d), the MQRS (cons) is SSE /{n — m), and M-W uses 
standard shape-restricted regression with variance estimate SSE /{n — l.Sd). 
Finally, the MLE uses standard monotone regression and variance estimate 
SSE /n. The percent bias is the difference between the mean of the com- 
puted estimator and the true cr, divided by the true a. Simulated estimates 
of the bias and standard deviation of the estimator of the model variance in 
convex regression are shown in Table 6, and compared with the MLE. 

APPENDIX 

Proof of Proposition 4. Given J C {l, . . . , m), there are four orthog- 
onal linear subspaces oiW^:V, C{{S^,j G J}), C{{'~f^,j ^ J}), and C{{w^,j = 
1, . . . ,n — m — r}). Because the dimensions of these subspaces add to n, any 
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Table 6 

Comparison of the estimator of model standard deviation using regression spline for a 
convex regression function, compared with the maximum likelihood variance estimator. 
The underlying regression function is f{x) =x^. Results from 10,000 simulated data sets; 

two significant figures presented 



n 


cr 




CQRS 


CQRS ( 


cons) 




MLE 


% bias 


std dev 


% bias 


std dev 


% bias 


std dev 


20 


0.1 


-2.8 


0.018 


11. 


0.0039 


-32 


0.0026 


40 


0.1 


-1.0 


0.012 


2.9 


0.0024 


-18.6 


0.0020 


80 


0.1 


-0.64 


0.0082 


0.81 


0.0017 


-10.9 


0.0015 


20 


1.0 


-2.4 


0.17 


12.1 


0.39 


-20 


0.29 


40 


1.0 


-0.89 


0.12 


7.1 


0.25 


-12 


0.21 


80 


1.0 


0.58 


0.081 


4.1 


0.17 


-6.7 


0.15 



y £ can be written as the sum of the projections onto these four sub- 
spaces. Let 9 = V + ^j^jbjS-' be the unique vector in C that minimizes 
\\y — 0|p with bj > for j € J. By Proposition 1, we have e = y — 6 is in the 
space spanned by the and the 7-',j ^ J, so we write e as a hnear com- 
bination of these. Now we need only show that the coefficients bj of the -y^ 
are nonnegative. This is easy by (6) because, for any j ^ J, {y — 0,S^) < 0, 
and 

(n~m—r \ 
l(^J 1=1 I ^ 

SUPPLEMENTARY MATERIAL 

R code: Supplement 1 (DOL 10.1214/08-AOAS167SUPPA). Performs a 
weighted monotone piecewise quadratic spline least-squares regression, in- 
puts: scatterplot points where the x are sorted and distinct, weights 
w must be positive, var(yi) = Wi k is the number of interior knots. These 
will be placed at approximately equal x-quantiles output: the values of the 
fit at the observed x values. 

R code: Supplement 2 (DOL 10.1214/08-AOAS167SUPPB). Performs a 
weighted convex piecewise cubic spline least-squares regression, inputs: scat- 
terplot points {x,y) where the x are sorted and distinct, weights w must be 
positive, var(yj) = Wi k is the number of interior knots. These will be placed 
at approximately equal x-quantiles output: the values of the fit at the ob- 
served x values. 

R code: Supplement 3 (DOL 10.1214/08-AOAS167SUPPC). Performs a 
weighted monotone convex piecewise cubic spline least-squares regression. 
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inputs: scatterplot points {x,y) where the x are sorted and distinct, weights 
w must be positive. var(?/j) = Wi k is the number of interior knots. These 
will be placed at approximately equal x-quantiles output: the values of the 
fit at the observed x values. 

R code: Supplement 4 (DOT: 10.1214/08-AOAS167SUPPD). Fit two par- 
allel monotone piecewise quadratic curves to a scatterplot. inputs: scatter- 
plot points {x,d,y) where the x are sorted and distinct and d is a vector of 
ones and zeros, k is the number of interior knots. These will be placed at 
approximately equal x-quantiles outputs: the values of the fit for d = at 
the observed x values and the increase in intercept for d=l. 
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